Skip to content

Conversation

nicoburns
Copy link
Contributor

@nicoburns nicoburns commented Jun 30, 2025

Replaces the match_token! procedural macro to be replaced with a series of simpler tag! macro_rules macros combined with a regular rust match expression.

Requires a new string-cache-codegen release:

The following code block:

//§ the-before-html-insertion-mode
InsertionMode::BeforeHtml => match_token!(token {
    Token::Characters(SplitStatus::NotSplit, text) => ProcessResult::SplitWhitespace(text),
    Token::Characters(SplitStatus::Whitespace, _) => ProcessResult::Done,
    Token::Comment(text) => self.append_comment_to_doc(text),

    tag @ <html> => {
        self.create_root(tag.attrs);
        self.mode.set(InsertionMode::BeforeHead);
        ProcessResult::Done
    }

    </head> </body> </html> </br> => else,

    tag @ </_> => self.unexpected(&tag),

    token => {
        self.create_root(vec!());
        ProcessResult::Reprocess(InsertionMode::BeforeHead, token)
    }
}),

would be replaced with:

// § the-before-html-insertion-mode
// <https://html.spec.whatwg.org/multipage/parsing.html#the-before-html-insertion-mode>
InsertionMode::BeforeHtml => {
    let anything_else = |token: Token| {
        self.create_root(vec![]);
        ProcessResult::Reprocess(InsertionMode::BeforeHead, token)
    };

    match token {
        Token::Comment(text) => self.append_comment_to_doc(text),

        Token::Characters(SplitStatus::NotSplit, text) => {
            ProcessResult::SplitWhitespace(text)
        },

        Token::Characters(SplitStatus::Whitespace, _) => ProcessResult::Done,
        Token::Tag(tag @ tag!(<html>)) => {
            self.create_root(tag.attrs);
            self.mode.set(InsertionMode::BeforeHead);
            ProcessResult::Done
        },

        Token::Tag(tag!(</head> | </body> | </html> | </br>)) => anything_else(token),
        Token::Tag(tag @ tag!(</>)) => self.unexpected(&tag),

        token => anything_else(token),
    }
},

@nicoburns nicoburns force-pushed the declarative_match_token branch 2 times, most recently from 762a26e to 1ead52f Compare June 30, 2025 12:50
Copy link
Contributor

@simonwuelker simonwuelker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea!

Having to write tag_token!(tag @ ["html"] | [/"body"]) is odd though.
Can't it be tag_token!(tag @ <html> | </body>) if the macro match arm is changed from (__inner:[/$tag:tt]) to something like (__inner:</$tag:ident>)?

You could convert the ident to a tt using the stringify macro and that would remove the need for quotes at the macro invocation site.

@nicoburns nicoburns force-pushed the declarative_match_token branch from 1ead52f to d22aff7 Compare September 4, 2025 20:47
@nicoburns
Copy link
Contributor Author

nicoburns commented Sep 4, 2025

@simonwuelker I managed to get <> instead of [] working. I also got bare <> and </> working. But I couldn't get rid of the quotes around the tag name.

The problem is order of evaluation. You can't do local_name!(stringify!($tag)) because local_name! gets passed the literal tokens stringify!($tag) rather the evaluation of that macro.

I think it could be made to work if we either made local_name take bare unquoted token or we create a const fn to create a const LocalName.

Signed-off-by: Nico Burns <[email protected]>
@nicoburns nicoburns force-pushed the declarative_match_token branch from d22aff7 to 5c39655 Compare September 4, 2025 21:03
Signed-off-by: Nico Burns <[email protected]>
@nicoburns nicoburns force-pushed the declarative_match_token branch 4 times, most recently from 05c70a9 to a8858a4 Compare September 5, 2025 02:02
@nicoburns
Copy link
Contributor Author

Ok, I've made this work. We can't use const fn until rust-lang/rust#76001 lands, but I got it working by making the string_cache macros accept idents for any atom that is entirely alphanumeric+underscores (servo/string-cache#296).

I've updated the example in PR description with the new syntax.

@nicoburns nicoburns force-pushed the declarative_match_token branch 2 times, most recently from 15c2fd8 to af5a120 Compare September 5, 2025 02:30
Signed-off-by: Nico Burns <[email protected]>
@nicoburns nicoburns force-pushed the declarative_match_token branch from af5a120 to 6e431bf Compare September 5, 2025 03:21
@nicoburns nicoburns requested a review from jdm September 5, 2025 03:47
@nicoburns
Copy link
Contributor Author

@simonwuelker @jdm (no urgency but:)

This is now fully working and is in a position where it could be implemeted. It would just be a case of going through the rules.rs file and updating every case to use the new syntax (this PR currently updates the first 2 cases to give an idea of what that would look like). So it would be good to get your views on whether this is a change we want to pursue.

The main difference are:

  • This would remove the need for the match_token proc macro and crate entirely, replacing it with the macro_rules macros in this PR (this also creats some minor formatting differences (see Token::Characters(SplitStatus::NotSplit cases) because auto-formatting now applies to these code blocks which it didn't before because of the macro).
  • The syntax in this PR is a little more verbose, but also less magic. The macro_rules macro outputs a pattern which then allows the match block to be a regular rust match rather than being processed by the proc_macro.
  • The "else case" is the other thing that changes here. Rather than the special "else syntax" like </head> </body> </html> </br> => else (which causes cases that match it to fall through to the fallback case) you have a guard like if is_not_tag!(tag, </head> | </body> | </html> | </br>) (or equivalently if !matches!(tag, tag!(</head> | </body> | </html> | </br>)). Again, this is a lot less magic and works like standard rust. But it slightly less directly follows the order of the spec as the guard has to be moved down to the "every start tag" and/or "every end tag case" rather than being specified further up (which the spec does for some reason). We could perhaps leave a comment where the "else" case is in the spec that refers you down to the guard on the later clause?)
  • For example of spec, see https://html.spec.whatwg.org/multipage/parsing.html#the-before-head-insertion-mode for example of spec. Either way, I think we could benefit from copying more of the spec into the rules.rs file.

Thoughts?

P.S. Minor note on syntax: it would also be possible to write TagToken!(tag @ <html>) instead of Token::Tag(tag @ tag!(<html>)) instead of if we preferred.

@simonwuelker
Copy link
Contributor

simonwuelker commented Sep 5, 2025

The "else case" is the other thing that changes here. Rather than the special "else syntax" like
=> else (which causes cases that match it to fall through to the fallback case) you have a guard like if is_not_tag!(tag, | | |
) (or equivalently if !matches!(tag, tag!( | | |
)). Again, this is a lot less magic and works like standard rust. But it slightly less directly follows the order of the spec as the guard has to be moved down to the "every start tag" and/or "every end tag case" rather than being specified further up (which the spec does for some reason). We could perhaps leave a comment where the "else" case is in the spec that refers you down to the guard on the later clause?)

Personally I would prefer to implement functions for the "else" cases (like any_other_tag_in_data_state) and the simply call those in two places.

Other than that, this looks great! I'm all for less magic and less proc macros, even if it ends up being slightly more verbose.

P.S. Minor note on syntax: it would also be possible to write TagToken!(tag @ ) instead of Token::Tag(tag @ tag!()) instead of if we preferred.

Not a big difference but I prefer to keep the macro invocation as small as possible(the current version).

Either way, I think we could benefit from copying more of the spec into the rules.rs file.

Yeah adding proper spec comments is something else that I eventually want to come around too... But it's a dull task (:

@jdm
Copy link
Member

jdm commented Sep 6, 2025

Other than that, this looks great! I'm all for less magic and less proc macros, even if it ends up being slightly more verbose.

Same! I love removing syntax unique to this crate.

Signed-off-by: Nico Burns <[email protected]>
@nicoburns nicoburns changed the title Replace match_token proc macro with macro_rules! macros Replace match_token! proc macro with tag! macro_rules macro Sep 8, 2025
Signed-off-by: Nico Burns <[email protected]>
@nicoburns nicoburns force-pushed the declarative_match_token branch from 5aac3e6 to b0b1b54 Compare September 8, 2025 11:53
@nicoburns nicoburns added the V-non-breaking A non-breaking change label Sep 8, 2025
@nicoburns nicoburns marked this pull request as ready for review September 8, 2025 12:12
@nicoburns
Copy link
Contributor Author

nicoburns commented Sep 8, 2025

Ok, I've rolled this out to the entire rules.rs, cleaned it up, documented the tag! macro, and deleted the match_token crate. I've gone with a closure for the else cases (which actually only occur in 5 of the InsertionModes). Normally I would a function for this, but writing out the generic types (which can't be reused from the outer scope) made that verbose in this case).

I haven't added proper spec comments, but I have added links to the spec for each InsertionMode. As part of this, I noticed two insertion modes ("in select" and "in select in table") which didn't appear to be the spec. Turns out the spec has changed and these modes no longer exist. I've opened #660 to track this.

We'll need a release of string-cache-codegen (servo/string-cache#297) before this can be merged, but otherwise this is ready to go (modulo any further review feedback (now published)

@nicoburns nicoburns added this pull request to the merge queue Sep 8, 2025
Merged via the queue into servo:main with commit 13a0c24 Sep 8, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
V-non-breaking A non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants